112 research outputs found

    Whole-Body Motion Capture and Beyond: From Model-Based Inference to Learning-Based Regression

    Get PDF
    Though effective and successful, traditional marker-less Motion Capture (MoCap) methods suffer from several limitations: 1) they presume a character-specific body model, which prevents a fully automatic pipeline and generalization over diverse body shapes; 2) they do not track the objects humans interact with, although human-object interaction is ubiquitous in reality; 3) they rely heavily on a sophisticated optimization process that requires a good initialization and strong priors, and can be slow. This thesis addresses all of these issues. First, we propose a fully automatic method to accurately reconstruct a 3D human body from multi-view RGB videos, the typical setup for MoCap systems. We pre-process all RGB videos to obtain 2D keypoints and silhouettes, then fit the SMPL body model to these 2D measurements in two successive stages: in the first stage, the shape and pose parameters of SMPL are estimated sequentially, frame by frame; in the second stage, a batch of frames is refined jointly with an additional Discrete Cosine Transform (DCT) prior. Our method naturally handles different body shapes and challenging poses without human intervention. Second, we extend this system to track rigid objects the subjects interact with. Our setup consists of 6 Azure Kinect RGB-D cameras. We pre-process all videos by segmenting humans and objects and detecting 2D body joints, and we adopt the SMPL-X model to better capture body and hand pose. The model is fitted to the 2D keypoints and accumulated point clouds, and we show that body pose provides important cues for better object tracking. The body and object poses are then jointly optimized under contact and interpenetration constraints. With this approach, we capture the first human-object interaction dataset with natural RGB images and plausible body and object motion. Lastly, we present the first practical and lightweight MoCap system that needs only 6 inertial measurement units (IMUs). Our approach is based on bi-directional recurrent neural networks (Bi-RNNs), which exploit temporal dependencies by jointly reasoning about past and future IMU measurements. To handle the scarcity of training data, we create synthetic data from archival MoCap data. Overall, our system runs ten times faster than traditional optimization-based methods and is numerically more accurate. We also show it is feasible to estimate which activity the subject is performing by observing only the IMU measurements from a smartwatch worn by the subject; this is useful for high-level semantic understanding of human behavior, but also alerts the public to potential privacy concerns. In summary, we advance marker-less MoCap by contributing the first automatic yet accurate system, extending MoCap methods to support rigid object tracking, and proposing a practical and lightweight algorithm using 6 IMUs. We believe our work makes marker-less and IMU-based MoCap cheaper and more practical, and thus closer to end-users for daily use.
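    To make the Bi-RNN idea concrete, here is a minimal PyTorch sketch of an IMU-to-pose regressor in the spirit described above. The layer sizes, the per-IMU feature layout (a flattened rotation matrix plus acceleration), and the SMPL pose dimensionality are illustrative assumptions, not the thesis's exact architecture.

```python
# Minimal sketch: 6 IMUs -> per-frame SMPL pose parameters via a Bi-RNN.
# Dimensions below are assumptions for illustration.
import torch
import torch.nn as nn

class BiRNNPoseRegressor(nn.Module):
    def __init__(self, n_imus=6, feat_per_imu=12, hidden=256, n_smpl_pose=72):
        super().__init__()
        in_dim = n_imus * feat_per_imu  # per IMU: 3x3 rotation (9) + acceleration (3)
        self.rnn = nn.LSTM(in_dim, hidden, num_layers=2,
                           batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_smpl_pose)  # 24 joints x 3 axis-angle

    def forward(self, imu_seq):
        # imu_seq: (batch, time, in_dim); the bidirectional pass lets every
        # frame condition on both past and future IMU measurements.
        h, _ = self.rnn(imu_seq)
        return self.head(h)  # (batch, time, n_smpl_pose)

model = BiRNNPoseRegressor()
poses = model(torch.randn(1, 100, 72))  # 100 frames of dummy IMU input
```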

    Real-time moving object classification with automatic scene division

    Get PDF
    We address the problem of moving object classification, aiming to classify moving objects in traffic-scene videos into pedestrians, bicycles, and vehicles. Instead of supervised learning on large, manually labeled training sets, our classifiers are initialized and refined online automatically. With efficient features extracted and organized, the approach runs in real time and achieves high classification accuracy. Once a view or scene change is detected, the algorithm automatically refines the classifiers and adapts them to the new environment. Experimental results demonstrate the effectiveness and robustness of the proposed approach.
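    As an illustration of the online-refinement idea, the sketch below updates a classifier incrementally and re-initializes it when a scene change is flagged. The feature extractor, the automatic labelling heuristic, and the choice of an SGD classifier are stand-ins, not the paper's method.

```python
# Online classifier refinement with scene-change handling (illustrative only).
import numpy as np
from sklearn.linear_model import SGDClassifier

CLASSES = np.array([0, 1, 2])  # pedestrian, bicycle, vehicle

clf = SGDClassifier(loss="log_loss")
initialized = False

def on_new_track(features, auto_label, scene_changed):
    """features: 1-D feature vector for a moving-object track;
    auto_label: label obtained automatically (e.g. size/speed heuristics), or None;
    scene_changed: True when a view/scene change is detected."""
    global clf, initialized
    if scene_changed:           # re-initialize and adapt to the new scene
        clf = SGDClassifier(loss="log_loss")
        initialized = False
    X = np.asarray(features).reshape(1, -1)
    if auto_label is not None:  # refine the classifier online
        clf.partial_fit(X, [auto_label], classes=CLASSES)
        initialized = True
    return clf.predict(X)[0] if initialized else None
```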

    MusiLingo: Bridging Music and Text with Pre-trained Language Models for Music Captioning and Query Response

    Full text link
    Large Language Models (LLMs) have shown immense potential in multimodal applications, yet the convergence of the textual and musical domains remains relatively unexplored. To address this gap, we present MusiLingo, a novel system for music caption generation and music-related query response. MusiLingo employs a single projection layer to align music representations from the pre-trained, frozen music audio model MERT with the frozen LLaMA language model, bridging the gap between music audio and textual contexts. We train it on an extensive music caption dataset and fine-tune it with instructional data. Due to the scarcity of high-quality music Q&A datasets, we create the MusicInstruct (MI) dataset from MusicCaps, tailored for open-ended music inquiries. Empirical evaluations demonstrate competitive performance in generating music captions and composing music-related Q&A pairs, and the introduced dataset enables notable advancements beyond previous ones.
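    A minimal sketch of the single-projection-layer alignment described above: frozen audio features are linearly mapped into the LLM's token-embedding space and prepended to the text embeddings. The dimensions (MERT-style 1024-d features, LLaMA-7B's 4096-d embeddings) are assumptions for illustration, not the released configuration.

```python
# Align frozen music-encoder features with a frozen LLM via one linear layer.
import torch
import torch.nn as nn

class MusicToLLMAdapter(nn.Module):
    def __init__(self, music_dim=1024, llm_dim=4096):
        super().__init__()
        self.proj = nn.Linear(music_dim, llm_dim)  # the only trainable part

    def forward(self, music_feats, text_embeds):
        # music_feats: (batch, n_frames, music_dim) from the frozen audio encoder
        # text_embeds: (batch, n_tokens, llm_dim) from the frozen LLM embedding table
        music_tokens = self.proj(music_feats)
        # Concatenated sequence is fed to the frozen LLM as its input embeddings.
        return torch.cat([music_tokens, text_embeds], dim=1)
```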

    MindLLM: Pre-training Lightweight Large Language Model from Scratch, Evaluations and Domain Applications

    Full text link
    Large Language Models (LLMs) have demonstrated remarkable performance across various natural language tasks, marking significant strides towards general artificial intelligence. While such progress is typically pursued by developing increasingly large-scale models, another branch is to develop lightweight custom models that better serve certain domains, given the high cost of training and deploying LLMs and the scarcity of resources. In this paper, we present MindLLM, a novel series of bilingual lightweight large language models trained from scratch, alleviating such burdens by offering models with 1.3 billion and 3 billion parameters. A thorough account of the experience accrued during large-model development is given, covering every step of the process, including data construction, model architecture, evaluation, and applications; we hope these insights are valuable for fellow academics and developers. MindLLM consistently matches or surpasses the performance of other open-source larger models on some public benchmarks. We also introduce an innovative instruction-tuning framework tailored for smaller models to enhance their capabilities efficiently. Moreover, we explore the application of MindLLM in specific vertical domains such as law and finance, underscoring the agility and adaptability of our lightweight models.

    Experimental Study on Stress and Strain Characteristics of Solidified Clay under Seawater Condition

    Get PDF
    This paper presents the results of a laboratory study on the stress-strain relationship of clay solidified under seawater-corrosion conditions. An automatic triaxial apparatus was used, and the axial stress and strain were monitored continuously. The dry density was 1.0 g/cm3, the cement contents were 4, 6, 8, and 10% by weight of dry soil particles, and the curing times were 28, 60, and 90 days. The results indicate that the stress-strain relationship of cemented clay is affected by soil density, cement content, and curing period. A transition from strain hardening to strain softening occurred as cement content increased, and a strong structure forms in cemented clay when the admixture content is 10% or more. The increase in strength of the solidified foundation results from increases in the internal friction angle and the cohesive force: the cohesive force increases markedly with cement content and curing age, whereas the internal friction angle changes little after reaching a certain value.
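    For reference, the two strength parameters discussed above combine through the standard Mohr-Coulomb criterion, presumably the relation underlying the triaxial analysis:

```latex
% Mohr-Coulomb shear strength
\tau_f = c + \sigma_n \tan\varphi
% \tau_f: shear strength, c: cohesion (cohesive force),
% \sigma_n: normal stress, \varphi: internal friction angle
```

    Under this relation, a marked rise in c with cement content and curing age raises strength across all confining pressures, even when \varphi saturates.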

    Phosphorus recovery from anaerobically digested liquor of screenings

    Get PDF
    Phosphorus is a limited resource that is predicted to be exhausted at some point during the twenty-first century. However, it is present in wastewaters at concentrations that come close to supplying the nation's annual fertiliser requirements. Many papers have addressed the recovery of phosphorus as struvite (magnesium ammonium phosphate hexahydrate) from different types of waste; the most prominent use of struvite is as a slow-release fertiliser for agricultural application, suitable as a replacement for chemical fertiliser. In this study, screenings produced during the wastewater treatment process were anaerobically digested, and the resulting digested liquor was used for phosphorus recovery in the form of struvite at different dry-solids concentrations. The theoretical struvite potential was calculated using a 1:1:1 (Mg:N:P) molar ratio. The results show that the digestate is high in phosphorus, of which up to 41% can be recovered; at 3% dry solids, 0.27 kg of struvite can be recovered per kg dry solids of screenings. Screenings thus prove a valuable source of additional phosphorus which current disposal practices fail to exploit.
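    The 1:1:1 molar-ratio calculation mentioned above reduces to capping struvite formation by the scarcest of the three constituents. The sketch below shows the arithmetic; the example concentrations are placeholders, not the study's measured values.

```python
# Theoretical struvite (MgNH4PO4·6H2O) potential from a 1:1:1 Mg:N:P ratio.
M_STRUVITE = 245.41  # g/mol, molar mass of struvite

def struvite_potential(mg_mol, n_mol, p_mol):
    """Moles of struvite are limited by the scarcest of Mg, N, and P."""
    return min(mg_mol, n_mol, p_mol) * M_STRUVITE  # grams of struvite

# e.g. liquor with P limiting: 10 mmol Mg, 20 mmol N, 8 mmol P per litre
grams_per_litre = struvite_potential(0.010, 0.020, 0.008)
print(f"{grams_per_litre:.2f} g struvite per litre")  # ~1.96 g/L
```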

    MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training

    Full text link
    Self-supervised learning (SSL) has recently emerged as a promising paradigm for training generalisable models on large-scale data in the fields of vision, text, and speech. Although SSL has proven effective in speech and audio, its application to music audio has yet to be thoroughly explored, primarily due to the distinctive challenges of modelling musical knowledge, particularly the tonal and pitched characteristics of music. To address this research gap, we propose an acoustic Music undERstanding model with large-scale self-supervised Training (MERT), which incorporates teacher models to provide pseudo labels for masked language modelling (MLM)-style acoustic pre-training. In our exploration, we identified a superior combination of teacher models that outperforms conventional speech and audio approaches: an acoustic teacher based on a Residual Vector Quantization Variational AutoEncoder (RVQ-VAE) and a musical teacher based on the Constant-Q Transform (CQT). These teachers effectively guide our student model, a BERT-style transformer encoder, to better model music audio. In addition, we introduce an in-batch noise mixture augmentation to enhance representation robustness, and we explore a wide range of settings to overcome the instability of acoustic language model pre-training, which allows our designed paradigm to scale from 95M to 330M parameters. Experimental results indicate that our model generalises well across 14 music understanding tasks and attains state-of-the-art (SOTA) overall scores. The code and models are online: https://github.com/yizhilll/MERT
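    Schematically, the multi-teacher pre-training above amounts to a student transformer predicting, at masked frames, discrete codes from the acoustic (RVQ-VAE) teacher and CQT frames from the musical teacher. The sketch below illustrates this; the model size, codebook size, CQT bin count, and equal loss weighting are assumptions, not MERT's published configuration.

```python
# MLM-style masked pre-training against two teachers (illustrative shapes).
import torch
import torch.nn as nn

class MaskedMusicStudent(nn.Module):
    def __init__(self, d_model=768, n_codes=1024, cqt_bins=84):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, nhead=12, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=12)
        self.code_head = nn.Linear(d_model, n_codes)  # acoustic-teacher targets
        self.cqt_head = nn.Linear(d_model, cqt_bins)  # musical-teacher targets

    def forward(self, frames, mask, code_targets, cqt_targets):
        # frames: (B, T, d_model); mask: (B, T) bool, True at masked frames
        h = self.encoder(frames)
        ce = nn.functional.cross_entropy(
            self.code_head(h)[mask], code_targets[mask])      # discrete codes
        mse = nn.functional.mse_loss(
            self.cqt_head(h)[mask], cqt_targets[mask])        # CQT regression
        return ce + mse  # combined masked-prediction loss
```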

    The Natural Compound Myricetin Effectively Represses the Malignant Progression of Prostate Cancer by Inhibiting PIM1 and Disrupting the PIM1/CXCR4 Interaction

    Get PDF
    Background/Aims: Natural compounds are a promising resource for anti-tumor drugs. Myricetin, an abundant flavonoid found in the bark and leaves of bayberry, shows multiple promising anti-tumor functions in various cancers. Methods: The cytotoxic, pro-apoptotic, and anti-metastatic effects of myricetin on prostate cancer cells were investigated in both in vitro and in vivo studies. Short-hairpin RNA knockdown of the proviral integration site for Moloney murine leukemia virus-1 (PIM1), pull-down and co-immunoprecipitation assays, and an intracellular Ca2+ flux assay were used to investigate the potential underlying mechanism of myricetin. ONCOMINE database mining and immunohistochemical analysis of prostate cancer tissues were used to evaluate the expression of PIM1 and CXCR4, as well as the correlation between PIM1/CXCR4 expression and the clinicopathologic characteristics and prognoses of prostate cancer patients. Results: Myricetin exerted selective cytotoxic, pro-apoptotic, and anti-metastatic effects on prostate cancer cells by inhibiting PIM1 and disrupting the PIM1/CXCR4 interaction. Moreover, PIM1 and CXCR4 were coexpressed and associated with aggressive clinicopathologic traits and poor prognosis in prostate cancer patients. Conclusion: These results offer preclinical evidence for myricetin as a potential chemopreventive and therapeutic agent for precision medicine tailored to prostate cancer patients characterized by concomitant elevated expression of PIM1 and CXCR4.